Overview
Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 424 | 430 |
| Missing cells (%) | 7.9% | 8.0% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
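The missing-cell percentages can be checked against the counts in this table. The following is a minimal sketch, assuming the percentage is the number of missing cells divided by variables times observations; the helper name `missing_pct` is illustrative and not part of ydata-profiling.

```python
def missing_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Share of missing cells, in percent, over all cells in the table."""
    total_cells = n_variables * n_observations
    return 100 * missing_cells / total_cells


# Figures copied from the Dataset statistics table.
pct_a = missing_pct(424, 12, 446)
pct_b = missing_pct(430, 12, 446)

print(f"Dataset A: {pct_a:.1f}%")  # 7.9%
print(f"Dataset B: {pct_b:.1f}%")  # 8.0%
```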
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
Alerts
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Fare is highly overall correlated with Pclass | Alert not present in this dataset | High correlation |
| Pclass is highly overall correlated with Fare | Alert not present in this dataset | High correlation |
| Sex is highly overall correlated with Survived | Sex is highly overall correlated with Survived | High correlation |
| Survived is highly overall correlated with Sex | Survived is highly overall correlated with Sex | High correlation |
| Age has 86 (19.3%) missing values | Age has 85 (19.1%) missing values | Missing |
| Cabin has 337 (75.6%) missing values | Cabin has 344 (77.1%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 308 (69.1%) zeros | SibSp has 308 (69.1%) zeros | Zeros |
| Parch has 342 (76.7%) zeros | Parch has 344 (77.1%) zeros | Zeros |
| Fare has 7 (1.6%) zeros | Fare has 12 (2.7%) zeros | Zeros |
| Alert not present in this dataset | Parch is highly overall correlated with SibSp | High correlation |
| Alert not present in this dataset | SibSp is highly overall correlated with Parch | High correlation |
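The rows marked "Alert not present in this dataset" amount to a set difference between the two alert lists. As a rough sketch, the high-correlation alerts can be transcribed by hand into (variable, correlated-with) pairs and compared; the tuple representation is an assumption made for illustration, not the report's internal format.

```python
# High-correlation alerts transcribed from the table above.
alerts_a = {("Fare", "Pclass"), ("Pclass", "Fare"), ("Sex", "Survived"), ("Survived", "Sex")}
alerts_b = {("Sex", "Survived"), ("Survived", "Sex"), ("Parch", "SibSp"), ("SibSp", "Parch")}

only_in_a = sorted(alerts_a - alerts_b)  # raised for Dataset A only
only_in_b = sorted(alerts_b - alerts_a)  # raised for Dataset B only
shared = sorted(alerts_a & alerts_b)     # raised for both datasets

print("Only in Dataset A:", only_in_a)
print("Only in Dataset B:", only_in_b)
print("Shared:", shared)
```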
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2024-10-29 15:28:38.892254 | 2024-10-29 15:28:42.099594 |
| Analysis finished | 2024-10-29 15:28:42.096230 | 2024-10-29 15:28:45.308858 |
| Duration | 3.2 seconds | 3.21 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
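The Duration row follows from the start and finish timestamps. A small sketch using only the standard library, assuming the timestamps are in the same time zone; the function name `duration_seconds` is illustrative.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed seconds between two timestamps in the report's format."""
    start = datetime.strptime(started, TIMESTAMP_FORMAT)
    end = datetime.strptime(finished, TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


# Timestamps copied from the Reproduction table.
duration_a = duration_seconds("2024-10-29 15:28:38.892254", "2024-10-29 15:28:42.096230")
duration_b = duration_seconds("2024-10-29 15:28:42.099594", "2024-10-29 15:28:45.308858")

print(round(duration_a, 2), round(duration_b, 2))  # 3.2 3.21
```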
Variables
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 437.9574 | 441.18834 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 2 | 1 |
| Maximum | 890 | 891 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 2 | 1 |
| 5-th percentile | 41.75 | 51.25 |
| Q1 | 221.25 | 216.25 |
| median | 437.5 | 436.5 |
| Q3 | 651.75 | 676.75 |
| 95-th percentile | 849.5 | 837.25 |
| Maximum | 890 | 891 |
| Range | 888 | 890 |
| Interquartile range (IQR) | 430.5 | 460.5 |
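The Range and Interquartile range (IQR) rows are differences of other rows in this table. A brief sketch that recomputes both from the reported PassengerId quantiles; the dictionary layout and the `spread` helper are illustrative.

```python
# PassengerId quantiles copied from the table above.
quantiles = {
    "Dataset A": {"min": 2, "q1": 221.25, "q3": 651.75, "max": 890},
    "Dataset B": {"min": 1, "q1": 216.25, "q3": 676.75, "max": 891},
}


def spread(q: dict) -> dict:
    """Range is max minus min; IQR is Q3 minus Q1."""
    return {"range": q["max"] - q["min"], "iqr": q["q3"] - q["q1"]}


results = {name: spread(q) for name, q in quantiles.items()}
for name, r in results.items():
    print(name, r)
```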
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 255.59131 | 256.13092 |
| Coefficient of variation (CV) | 0.58359856 | 0.5805478 |
| Kurtosis | -1.1571442 | -1.2124609 |
| Mean | 437.9574 | 441.18834 |
| Median Absolute Deviation (MAD) | 216 | 228.5 |
| Skewness | 0.059607537 | 0.055496879 |
| Sum | 195329 | 196770 |
| Variance | 65326.917 | 65603.048 |
| Monotonicity | Not monotonic | Not monotonic |
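Several descriptive statistics are tied together by simple identities: the mean is the sum over the number of observations, the coefficient of variation is the standard deviation over the mean, and the variance is the squared standard deviation. The sketch below checks the reported PassengerId figures against those identities. It assumes 446 non-missing values per dataset, as stated above, and makes no claim about how the report itself computes them.

```python
# PassengerId figures copied from the Descriptive statistics table.
stats = {
    "Dataset A": {"sum": 195329, "n": 446, "std": 255.59131},
    "Dataset B": {"sum": 196770, "n": 446, "std": 256.13092},
}


def derived(s: dict) -> dict:
    """Recompute mean, coefficient of variation and variance from sum, n and std."""
    mean = s["sum"] / s["n"]
    return {"mean": mean, "cv": s["std"] / mean, "variance": s["std"] ** 2}


derived_a = derived(stats["Dataset A"])
derived_b = derived(stats["Dataset B"])

for name, d in (("Dataset A", derived_a), ("Dataset B", derived_b)):
    print(f"{name}: mean={d['mean']:.4f} cv={d['cv']:.6f} variance={d['variance']:.3f}")
```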
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 789 | 1 | 0.2% |
| 150 | 1 | 0.2% |
| 307 | 1 | 0.2% |
| 503 | 1 | 0.2% |
| 282 | 1 | 0.2% |
| 485 | 1 | 0.2% |
| 230 | 1 | 0.2% |
| 663 | 1 | 0.2% |
| 436 | 1 | 0.2% |
| 728 | 1 | 0.2% |
| Other values (436) | 436 | 97.8% |
| Value | Count | Frequency (%) |
| 814 | 1 | 0.2% |
| 264 | 1 | 0.2% |
| 687 | 1 | 0.2% |
| 499 | 1 | 0.2% |
| 59 | 1 | 0.2% |
| 812 | 1 | 0.2% |
| 746 | 1 | 0.2% |
| 492 | 1 | 0.2% |
| 131 | 1 | 0.2% |
| 838 | 1 | 0.2% |
| Other values (436) | 436 | 97.8% |
| Value | Count | Frequency (%) |
| 2 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| 8 | 1 | 0.2% |
| 9 | 1 | 0.2% |
| 17 | 1 | 0.2% |
| 18 | 1 | 0.2% |
| 19 | 1 | 0.2% |
| 23 | 1 | 0.2% |
| 24 | 1 | 0.2% |
| 25 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 1 | 1 | 0.2% |
| 7 | 1 | 0.2% |
| 11 | 1 | 0.2% |
| 12 | 1 | 0.2% |
| 13 | 1 | 0.2% |
| 15 | 1 | 0.2% |
| 16 | 1 | 0.2% |
| 21 | 1 | 0.2% |
| 24 | 1 | 0.2% |
| 25 | 1 | 0.2% |
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 0 | 0 |
| 2nd row | 1 | 0 |
| 3rd row | 0 | 0 |
| 4th row | 0 | 1 |
| 5th row | 1 | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 268 | 60.1% |
| 1 | 178 | 39.9% |
| Value | Count | Frequency (%) |
| 0 | 296 | 66.4% |
| 1 | 150 | 33.6% |
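The Frequency (%) column is each count divided by the total number of rows. A minimal sketch for the Survived counts, assuming one-decimal rounding as used elsewhere in the report; the `frequencies` helper is illustrative.

```python
def frequencies(counts: dict) -> dict:
    """Convert value counts to percentages of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.items()}


# Survived counts copied from the Common Values tables (Dataset A, then Dataset B).
survived_a = {0: 268, 1: 178}
survived_b = {0: 296, 1: 150}

print(frequencies(survived_a))  # {0: 60.1, 1: 39.9}
print(frequencies(survived_b))  # {0: 66.4, 1: 33.6}
```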
Length
[Figure: Histogram of lengths of the category]
Common Values (Plot)
[Figure: common values for Dataset A and Dataset B]
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 268 | |
| 1 | 178 |
| Value | Count | Frequency (%) |
| 0 | 296 | |
| 1 | 150 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 0 | 268 | |
| 1 | 178 |
| Value | Count | Frequency (%) |
| 0 | 296 | |
| 1 | 150 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 0 | 268 | |
| 1 | 178 |
| Value | Count | Frequency (%) |
| 0 | 296 | |
| 1 | 150 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 0 | 268 | |
| 1 | 178 |
| Value | Count | Frequency (%) |
| 0 | 296 | |
| 1 | 150 |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 2 | 1 |
| 2nd row | 1 | 3 |
| 3rd row | 3 | 1 |
| 4th row | 3 | 2 |
| 5th row | 1 | 3 |
Common Values
| Value | Count | Frequency (%) |
| 3 | 236 | 52.9% |
| 1 | 115 | 25.8% |
| 2 | 95 | 21.3% |
| Value | Count | Frequency (%) |
| 3 | 252 | 56.5% |
| 1 | 104 | 23.3% |
| 2 | 90 | 20.2% |
Length
[Figure: Histogram of lengths of the category]
Common Values (Plot)
[Figure: common values for Dataset A and Dataset B]
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 236 | |
| 1 | 115 | |
| 2 | 95 |
| Value | Count | Frequency (%) |
| 3 | 252 | |
| 1 | 104 | |
| 2 | 90 | 20.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 236 | |
| 1 | 115 | |
| 2 | 95 |
| Value | Count | Frequency (%) |
| 3 | 252 | |
| 1 | 104 | |
| 2 | 90 | 20.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 236 | |
| 1 | 115 | |
| 2 | 95 |
| Value | Count | Frequency (%) |
| 3 | 252 | |
| 1 | 104 | |
| 2 | 90 | 20.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 236 | |
| 1 | 115 | |
| 2 | 95 |
| Value | Count | Frequency (%) |
| 3 | 252 | |
| 1 | 104 | |
| 2 | 90 | 20.2% |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 67 | 61 |
| Median length | 50 | 46 |
| Mean length | 26.829596 | 26.078475 |
| Min length | 12 | 12 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Byles, Rev. Thomas Roussel Davids | Harrison, Mr. William |
| 2nd row | Fleming, Miss. Margaret | Panula, Mr. Jaako Arnold |
| 3rd row | O'Sullivan, Miss. Bridget Mary | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) |
| 4th row | Olsson, Mr. Nils Johan Goransson | West, Miss. Constance Mirium |
| 5th row | Bishop, Mr. Dickinson H | Lester, Mr. James |
| Value | Count | Frequency (%) |
| mr | 263 | 14.6% |
| miss | 94 | 5.2% |
| mrs | 61 | 3.4% |
| william | 27 | 1.5% |
| john | 16 | 0.9% |
| master | 16 | 0.9% |
| james | 15 | 0.8% |
| charles | 14 | 0.8% |
| henry | 14 | 0.8% |
| george | 13 | 0.7% |
| Other values (889) | 1263 |
| Value | Count | Frequency (%) |
| mr | 274 | 15.6% |
| miss | 90 | 5.1% |
| mrs | 50 | 2.8% |
| william | 31 | 1.8% |
| master | 22 | 1.2% |
| john | 20 | 1.1% |
| henry | 18 | 1.0% |
| james | 15 | 0.9% |
| george | 11 | 0.6% |
| mary | 11 | 0.6% |
| Other values (882) | 1220 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1351 | 11.3% |
| r | 984 | 8.2% |
| e | 868 | 7.3% |
| a | 812 | 6.8% |
| i | 683 | 5.7% |
| s | 654 | 5.5% |
| n | 650 | 5.4% |
| M | 570 | 4.8% |
| l | 521 | 4.4% |
| o | 482 | 4.0% |
| Other values (49) | 4391 |
| Value | Count | Frequency (%) |
| (space) | 1318 | 11.3% |
| r | 970 | 8.3% |
| e | 838 | 7.2% |
| a | 810 | 7.0% |
| n | 664 | 5.7% |
| s | 638 | 5.5% |
| i | 628 | 5.4% |
| M | 555 | 4.8% |
| l | 492 | 4.2% |
| o | 473 | 4.1% |
| Other values (50) | 4245 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 11966 |
| Value | Count | Frequency (%) |
| (unknown) | 11631 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1351 | 11.3% |
| r | 984 | 8.2% |
| e | 868 | 7.3% |
| a | 812 | 6.8% |
| i | 683 | 5.7% |
| s | 654 | 5.5% |
| n | 650 | 5.4% |
| M | 570 | 4.8% |
| l | 521 | 4.4% |
| o | 482 | 4.0% |
| Other values (49) | 4391 |
| Value | Count | Frequency (%) |
| (space) | 1318 | 11.3% |
| r | 970 | 8.3% |
| e | 838 | 7.2% |
| a | 810 | 7.0% |
| n | 664 | 5.7% |
| s | 638 | 5.5% |
| i | 628 | 5.4% |
| M | 555 | 4.8% |
| l | 492 | 4.2% |
| o | 473 | 4.1% |
| Other values (50) | 4245 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 11966 |
| Value | Count | Frequency (%) |
| (unknown) | 11631 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1351 | 11.3% |
| r | 984 | 8.2% |
| e | 868 | 7.3% |
| a | 812 | 6.8% |
| i | 683 | 5.7% |
| s | 654 | 5.5% |
| n | 650 | 5.4% |
| M | 570 | 4.8% |
| l | 521 | 4.4% |
| o | 482 | 4.0% |
| Other values (49) | 4391 |
| Value | Count | Frequency (%) |
| (space) | 1318 | 11.3% |
| r | 970 | 8.3% |
| e | 838 | 7.2% |
| a | 810 | 7.0% |
| n | 664 | 5.7% |
| s | 638 | 5.5% |
| i | 628 | 5.4% |
| M | 555 | 4.8% |
| l | 492 | 4.2% |
| o | 473 | 4.1% |
| Other values (50) | 4245 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 11966 |
| Value | Count | Frequency (%) |
| (unknown) | 11631 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1351 | 11.3% |
| r | 984 | 8.2% |
| e | 868 | 7.3% |
| a | 812 | 6.8% |
| i | 683 | 5.7% |
| s | 654 | 5.5% |
| n | 650 | 5.4% |
| M | 570 | 4.8% |
| l | 521 | 4.4% |
| o | 482 | 4.0% |
| Other values (49) | 4391 |
| Value | Count | Frequency (%) |
| (space) | 1318 | 11.3% |
| r | 970 | 8.3% |
| e | 838 | 7.2% |
| a | 810 | 7.0% |
| n | 664 | 5.7% |
| s | 638 | 5.5% |
| i | 628 | 5.4% |
| M | 555 | 4.8% |
| l | 492 | 4.2% |
| o | 473 | 4.1% |
| Other values (50) | 4245 |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.6995516 | 4.6412556 |
| Min length | 4 | 4 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | male | male |
| 2nd row | female | male |
| 3rd row | female | female |
| 4th row | male | female |
| 5th row | male | male |
Common Values
| Value | Count | Frequency (%) |
| male | 290 | 65.0% |
| female | 156 | 35.0% |
| Value | Count | Frequency (%) |
| male | 303 | 67.9% |
| female | 143 | 32.1% |
Length
[Figure: Histogram of lengths of the category]
Common Values (Plot)
[Figure: common values for Dataset A and Dataset B]
Most occurring characters
| Value | Count | Frequency (%) |
| e | 602 | 28.7% |
| m | 446 | 21.3% |
| a | 446 | 21.3% |
| l | 446 | 21.3% |
| f | 156 | 7.4% |
| Value | Count | Frequency (%) |
| e | 589 | 28.5% |
| m | 446 | 21.5% |
| a | 446 | 21.5% |
| l | 446 | 21.5% |
| f | 143 | 6.9% |
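The character counts for Sex follow directly from the category counts: every row contributes the letters of "male" or "female", and "female" contains "e" twice. The sketch below reproduces the counts with `collections.Counter`; the `char_counts` helper is illustrative and is not the report's own implementation.

```python
from collections import Counter


def char_counts(value_counts: dict) -> Counter:
    """Count characters across all rows, given how often each category value occurs."""
    counts = Counter()
    for value, n_rows in value_counts.items():
        for ch in value:
            counts[ch] += n_rows
    return counts


# Sex counts copied from the Common Values tables (Dataset A, then Dataset B).
chars_a = char_counts({"male": 290, "female": 156})
chars_b = char_counts({"male": 303, "female": 143})

print(dict(chars_a), sum(chars_a.values()))
print(dict(chars_b), sum(chars_b.values()))
```

The totals (2096 and 2070) match the counts listed under "Most occurring categories" for this variable.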
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2096 |
| Value | Count | Frequency (%) |
| (unknown) | 2070 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| e | 602 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 156 | 7.4% |
| Value | Count | Frequency (%) |
| e | 589 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 143 | 6.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2096 |
| Value | Count | Frequency (%) |
| (unknown) | 2070 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| e | 602 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 156 | 7.4% |
| Value | Count | Frequency (%) |
| e | 589 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 143 | 6.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2096 |
| Value | Count | Frequency (%) |
| (unknown) | 2070 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| e | 602 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 156 | 7.4% |
| Value | Count | Frequency (%) |
| e | 589 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 143 | 6.9% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 79 | 74 |
| Distinct (%) | 21.9% | 20.5% |
| Missing | 86 | 85 |
| Missing (%) | 19.3% | 19.1% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 30.615056 | 29.611274 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.67 | 0.67 |
| Maximum | 80 | 71 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.67 | 0.67 |
| 5-th percentile | 4.95 | 4 |
| Q1 | 21 | 21 |
| median | 29 | 28 |
| Q3 | 40 | 38 |
| 95-th percentile | 58 | 55.5 |
| Maximum | 80 | 71 |
| Range | 79.33 | 70.33 |
| Interquartile range (IQR) | 19 | 17 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 15.08413 | 14.256499 |
| Coefficient of variation (CV) | 0.49270302 | 0.48145511 |
| Kurtosis | 0.088619624 | -0.02156589 |
| Mean | 30.615056 | 29.611274 |
| Median Absolute Deviation (MAD) | 9.5 | 8 |
| Skewness | 0.40014997 | 0.27754377 |
| Sum | 11021.42 | 10689.67 |
| Variance | 227.53099 | 203.24778 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 24 | 17 | 3.8% |
| 22 | 16 | 3.6% |
| 19 | 15 | 3.4% |
| 18 | 14 | 3.1% |
| 21 | 13 | 2.9% |
| 36 | 13 | 2.9% |
| 29 | 13 | 2.9% |
| 32 | 11 | 2.5% |
| 31 | 9 | 2.0% |
| 16 | 9 | 2.0% |
| Other values (69) | 230 | |
| (Missing) | 86 | 19.3% |
| Value | Count | Frequency (%) |
| 22 | 16 | 3.6% |
| 28 | 15 | 3.4% |
| 19 | 14 | 3.1% |
| 24 | 14 | 3.1% |
| 21 | 13 | 2.9% |
| 30 | 12 | 2.7% |
| 35 | 12 | 2.7% |
| 25 | 12 | 2.7% |
| 18 | 11 | 2.5% |
| 36 | 11 | 2.5% |
| Other values (64) | 231 | |
| (Missing) | 85 | 19.1% |
| Value | Count | Frequency (%) |
| 0.67 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 1 | 6 | |
| 2 | 4 | |
| 3 | 2 | 0.4% |
| 4 | 4 | |
| 5 | 3 | |
| 6 | 2 | 0.4% |
| 7 | 2 | 0.4% |
| 8 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0.67 | 1 | 0.2% |
| 1 | 5 | |
| 2 | 6 | |
| 3 | 3 | |
| 4 | 6 | |
| 5 | 3 | |
| 6 | 2 | 0.4% |
| 7 | 3 | |
| 8 | 2 | 0.4% |
| 9 | 2 | 0.4% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.50672646 | 0.53811659 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 308 | 308 |
| Zeros (%) | 69.1% | 69.1% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 3 | 3 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.067897 | 1.1658551 |
| Coefficient of variation (CV) | 2.1074428 | 2.1665473 |
| Kurtosis | 17.962352 | 16.281306 |
| Mean | 0.50672646 | 0.53811659 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.6524325 | 3.604036 |
| Sum | 226 | 240 |
| Variance | 1.1404041 | 1.359218 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=7)
| Value | Count | Frequency (%) |
| 0 | 308 | |
| 1 | 101 | 22.6% |
| 2 | 13 | 2.9% |
| 3 | 11 | 2.5% |
| 4 | 8 | 1.8% |
| 8 | 3 | 0.7% |
| 5 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 308 | |
| 1 | 100 | 22.4% |
| 4 | 12 | 2.7% |
| 2 | 12 | 2.7% |
| 3 | 7 | 1.6% |
| 8 | 4 | 0.9% |
| 5 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 308 | |
| 1 | 101 | 22.6% |
| 2 | 13 | 2.9% |
| 3 | 11 | 2.5% |
| 4 | 8 | 1.8% |
| 5 | 2 | 0.4% |
| 8 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 308 | |
| 1 | 100 | 22.4% |
| 2 | 12 | 2.7% |
| 3 | 7 | 1.6% |
| 4 | 12 | 2.7% |
| 5 | 3 | 0.7% |
| 8 | 4 | 0.9% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 6 | 6 |
| Distinct (%) | 1.3% | 1.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.36098655 | 0.35201794 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 5 | 5 |
| Zeros | 342 | 344 |
| Zeros (%) | 76.7% | 77.1% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0 | 0 |
| 95-th percentile | 2 | 2 |
| Maximum | 5 | 5 |
| Range | 5 | 5 |
| Interquartile range (IQR) | 0 | 0 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.75977697 | 0.74612585 |
| Coefficient of variation (CV) | 2.1047238 | 2.1195677 |
| Kurtosis | 8.4018259 | 7.465106 |
| Mean | 0.36098655 | 0.35201794 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.593079 | 2.5211801 |
| Sum | 161 | 157 |
| Variance | 0.57726105 | 0.55670378 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
| 0 | 342 | |
| 1 | 59 | 13.2% |
| 2 | 39 | 8.7% |
| 3 | 2 | 0.4% |
| 5 | 2 | 0.4% |
| 4 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 344 | |
| 1 | 59 | 13.2% |
| 2 | 36 | 8.1% |
| 3 | 3 | 0.7% |
| 4 | 3 | 0.7% |
| 5 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 342 | |
| 1 | 59 | 13.2% |
| 2 | 39 | 8.7% |
| 3 | 2 | 0.4% |
| 4 | 2 | 0.4% |
| 5 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 344 | |
| 1 | 59 | 13.2% |
| 2 | 36 | 8.1% |
| 3 | 3 | 0.7% |
| 4 | 3 | 0.7% |
| 5 | 1 | 0.2% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 386 | 387 |
| Distinct (%) | 86.5% | 86.8% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.8654709 | 6.7511211 |
| Min length | 4 | 4 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 342 | 347 |
| Unique (%) | 76.7% | 77.8% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 244310 | 112059 |
| 2nd row | 17421 | 3101295 |
| 3rd row | 330909 | 113781 |
| 4th row | 347464 | C.A. 34651 |
| 5th row | 11967 | A/4 48871 |
| Value | Count | Frequency (%) |
| pc | 36 | 6.3% |
| c.a | 13 | 2.3% |
| a/5 | 9 | 1.6% |
| ston/o | 7 | 1.2% |
| 2 | 7 | 1.2% |
| sc/paris | 6 | 1.0% |
| soton/oq | 5 | 0.9% |
| ca | 5 | 0.9% |
| 4133 | 4 | 0.7% |
| 14879 | 4 | 0.7% |
| Other values (406) | 476 |
| Value | Count | Frequency (%) |
| pc | 25 | 4.4% |
| c.a | 15 | 2.6% |
| a/5 | 8 | 1.4% |
| ca | 7 | 1.2% |
| ston/o | 7 | 1.2% |
| 2 | 7 | 1.2% |
| 1601 | 5 | 0.9% |
| 3101295 | 5 | 0.9% |
| a/4 | 4 | 0.7% |
| 347082 | 4 | 0.7% |
| Other values (410) | 480 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 387 | |
| 1 | 362 | |
| 2 | 292 | |
| 7 | 249 | 8.1% |
| 4 | 222 | 7.3% |
| 6 | 210 | 6.9% |
| 5 | 202 | 6.6% |
| 0 | 195 | 6.4% |
| 9 | 174 | 5.7% |
| 8 | 130 | 4.2% |
| Other values (25) | 639 |
| Value | Count | Frequency (%) |
| 3 | 384 | |
| 1 | 329 | |
| 2 | 314 | |
| 7 | 246 | |
| 4 | 228 | 7.6% |
| 0 | 202 | 6.7% |
| 6 | 201 | 6.7% |
| 5 | 191 | 6.3% |
| 9 | 165 | 5.5% |
| 8 | 145 | 4.8% |
| Other values (25) | 606 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 3062 |
| Value | Count | Frequency (%) |
| (unknown) | 3011 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 387 | |
| 1 | 362 | |
| 2 | 292 | |
| 7 | 249 | 8.1% |
| 4 | 222 | 7.3% |
| 6 | 210 | 6.9% |
| 5 | 202 | 6.6% |
| 0 | 195 | 6.4% |
| 9 | 174 | 5.7% |
| 8 | 130 | 4.2% |
| Other values (25) | 639 |
| Value | Count | Frequency (%) |
| 3 | 384 | |
| 1 | 329 | |
| 2 | 314 | |
| 7 | 246 | |
| 4 | 228 | 7.6% |
| 0 | 202 | 6.7% |
| 6 | 201 | 6.7% |
| 5 | 191 | 6.3% |
| 9 | 165 | 5.5% |
| 8 | 145 | 4.8% |
| Other values (25) | 606 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 3062 |
| Value | Count | Frequency (%) |
| (unknown) | 3011 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 387 | |
| 1 | 362 | |
| 2 | 292 | |
| 7 | 249 | 8.1% |
| 4 | 222 | 7.3% |
| 6 | 210 | 6.9% |
| 5 | 202 | 6.6% |
| 0 | 195 | 6.4% |
| 9 | 174 | 5.7% |
| 8 | 130 | 4.2% |
| Other values (25) | 639 |
| Value | Count | Frequency (%) |
| 3 | 384 | |
| 1 | 329 | |
| 2 | 314 | |
| 7 | 246 | |
| 4 | 228 | 7.6% |
| 0 | 202 | 6.7% |
| 6 | 201 | 6.7% |
| 5 | 191 | 6.3% |
| 9 | 165 | 5.5% |
| 8 | 145 | 4.8% |
| Other values (25) | 606 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 3062 |
| Value | Count | Frequency (%) |
| (unknown) | 3011 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 387 | |
| 1 | 362 | |
| 2 | 292 | |
| 7 | 249 | 8.1% |
| 4 | 222 | 7.3% |
| 6 | 210 | 6.9% |
| 5 | 202 | 6.6% |
| 0 | 195 | 6.4% |
| 9 | 174 | 5.7% |
| 8 | 130 | 4.2% |
| Other values (25) | 639 |
| Value | Count | Frequency (%) |
| 3 | 384 | |
| 1 | 329 | |
| 2 | 314 | |
| 7 | 246 | |
| 4 | 228 | 7.6% |
| 0 | 202 | 6.7% |
| 6 | 201 | 6.7% |
| 5 | 191 | 6.3% |
| 9 | 165 | 5.5% |
| 8 | 145 | 4.8% |
| Other values (25) | 606 |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 181 | 189 |
| Distinct (%) | 40.6% | 42.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 31.619628 | 31.456362 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 263 | 512.3292 |
| Zeros | 7 | 12 |
| Zeros (%) | 1.6% | 2.7% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.2292 | 7.0542 |
| Q1 | 7.925 | 7.8958 |
| median | 14.45625 | 13.5 |
| Q3 | 31.3875 | 31.275 |
| 95-th percentile | 108.28125 | 110.8833 |
| Maximum | 263 | 512.3292 |
| Range | 263 | 512.3292 |
| Interquartile range (IQR) | 23.4625 | 23.3792 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 41.12656 | 48.13367 |
| Coefficient of variation (CV) | 1.3006655 | 1.5301728 |
| Kurtosis | 11.648125 | 29.223275 |
| Mean | 31.619628 | 31.456362 |
| Median Absolute Deviation (MAD) | 6.72915 | 6.2708 |
| Skewness | 3.0403662 | 4.4782418 |
| Sum | 14102.354 | 14029.537 |
| Variance | 1691.3939 | 2316.8502 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 8.05 | 25 | 5.6% |
| 13 | 24 | 5.4% |
| 7.8958 | 21 | 4.7% |
| 7.75 | 16 | 3.6% |
| 10.5 | 14 | 3.1% |
| 26 | 10 | 2.2% |
| 7.925 | 10 | 2.2% |
| 7.775 | 8 | 1.8% |
| 7.25 | 8 | 1.8% |
| 0 | 7 | 1.6% |
| Other values (171) | 303 |
| Value | Count | Frequency (%) |
| 8.05 | 23 | 5.2% |
| 7.75 | 21 | 4.7% |
| 7.8958 | 19 | 4.3% |
| 13 | 17 | 3.8% |
| 26 | 14 | 3.1% |
| 0 | 12 | 2.7% |
| 10.5 | 12 | 2.7% |
| 7.925 | 11 | 2.5% |
| 7.775 | 9 | 2.0% |
| 7.2292 | 8 | 1.8% |
| Other values (179) | 300 |
| Value | Count | Frequency (%) |
| 0 | 7 | |
| 6.975 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 4 | |
| 7.0542 | 1 | 0.2% |
| 7.125 | 1 | 0.2% |
| 7.1417 | 1 | 0.2% |
| 7.225 | 6 | |
| 7.2292 | 5 | |
| 7.25 | 8 |
| Value | Count | Frequency (%) |
| 0 | 12 | |
| 6.2375 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 2 | 0.4% |
| 6.95 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 3 | 0.7% |
| 7.0542 | 2 | 0.4% |
| 7.125 | 1 | 0.2% |
Cabin
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 91 | 87 |
| Distinct (%) | 83.5% | 85.3% |
| Missing | 337 | 344 |
| Missing (%) | 75.6% | 77.1% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 15 | 15 |
| Median length | 3 | 3 |
| Mean length | 3.5229358 | 3.745098 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 75 | 75 |
| Unique (%) | 68.8% | 73.5% |
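For Cabin, the Distinct (%) and Unique (%) figures appear to be taken over non-missing values rather than over all 446 rows. The sketch below checks that reading against the reported numbers; the denominator choice is inferred from the figures, and the helper name `pct_of_present` is illustrative.

```python
def pct_of_present(count: int, n_observations: int, n_missing: int) -> float:
    """Percentage of a count relative to the non-missing observations."""
    return 100 * count / (n_observations - n_missing)


# Cabin figures copied from the tables above: 446 rows, 337 (A) and 344 (B) missing.
distinct_a = pct_of_present(91, 446, 337)
distinct_b = pct_of_present(87, 446, 344)
unique_a = pct_of_present(75, 446, 337)
unique_b = pct_of_present(75, 446, 344)

print(f"Distinct: {distinct_a:.1f}% / {distinct_b:.1f}%")  # 83.5% / 85.3%
print(f"Unique:   {unique_a:.1f}% / {unique_b:.1f}%")      # 68.8% / 73.5%
```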
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | B49 | B94 |
| 2nd row | E58 | C22 C26 |
| 3rd row | B96 B98 | B22 |
| 4th row | E101 | E46 |
| 5th row | E67 | C68 |
| Value | Count | Frequency (%) |
| b96 | 3 | 2.4% |
| b98 | 3 | 2.4% |
| c23 | 3 | 2.4% |
| c25 | 3 | 2.4% |
| c27 | 3 | 2.4% |
| b49 | 2 | 1.6% |
| e25 | 2 | 1.6% |
| e67 | 2 | 1.6% |
| c92 | 2 | 1.6% |
| g6 | 2 | 1.6% |
| Other values (91) | 100 |
| Value | Count | Frequency (%) |
| f2 | 3 | 2.4% |
| g6 | 3 | 2.4% |
| c23 | 3 | 2.4% |
| c25 | 3 | 2.4% |
| c27 | 3 | 2.4% |
| f | 3 | 2.4% |
| d36 | 2 | 1.6% |
| e101 | 2 | 1.6% |
| b22 | 2 | 1.6% |
| c22 | 2 | 1.6% |
| Other values (89) | 98 |
Most occurring characters
| Value | Count | Frequency (%) |
| C | 38 | 9.9% |
| 2 | 34 | 8.9% |
| B | 34 | 8.9% |
| 3 | 33 | 8.6% |
| 1 | 28 | 7.3% |
| 8 | 26 | 6.8% |
| 6 | 23 | 6.0% |
| 4 | 22 | 5.7% |
| E | 21 | 5.5% |
| 5 | 21 | 5.5% |
| Other values (9) | 104 |
| Value | Count | Frequency (%) |
| 2 | 44 | |
| C | 37 | 9.7% |
| 6 | 33 | 8.6% |
| B | 32 | 8.4% |
| 3 | 29 | 7.6% |
| 1 | 26 | 6.8% |
| (space) | 22 | 5.8% |
| E | 20 | 5.2% |
| 4 | 20 | 5.2% |
| 7 | 19 | 5.0% |
| Other values (9) | 100 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 384 |
| Value | Count | Frequency (%) |
| (unknown) | 382 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| C | 38 | 9.9% |
| 2 | 34 | 8.9% |
| B | 34 | 8.9% |
| 3 | 33 | 8.6% |
| 1 | 28 | 7.3% |
| 8 | 26 | 6.8% |
| 6 | 23 | 6.0% |
| 4 | 22 | 5.7% |
| E | 21 | 5.5% |
| 5 | 21 | 5.5% |
| Other values (9) | 104 |
| Value | Count | Frequency (%) |
| 2 | 44 | |
| C | 37 | 9.7% |
| 6 | 33 | 8.6% |
| B | 32 | 8.4% |
| 3 | 29 | 7.6% |
| 1 | 26 | 6.8% |
| (space) | 22 | 5.8% |
| E | 20 | 5.2% |
| 4 | 20 | 5.2% |
| 7 | 19 | 5.0% |
| Other values (9) | 100 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 384 |
| Value | Count | Frequency (%) |
| (unknown) | 382 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| C | 38 | 9.9% |
| 2 | 34 | 8.9% |
| B | 34 | 8.9% |
| 3 | 33 | 8.6% |
| 1 | 28 | 7.3% |
| 8 | 26 | 6.8% |
| 6 | 23 | 6.0% |
| 4 | 22 | 5.7% |
| E | 21 | 5.5% |
| 5 | 21 | 5.5% |
| Other values (9) | 104 |
| Value | Count | Frequency (%) |
| 2 | 44 | |
| C | 37 | 9.7% |
| 6 | 33 | 8.6% |
| B | 32 | 8.4% |
| 3 | 29 | 7.6% |
| 1 | 26 | 6.8% |
| (space) | 22 | 5.8% |
| E | 20 | 5.2% |
| 4 | 20 | 5.2% |
| 7 | 19 | 5.0% |
| Other values (9) | 100 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 384 |
| Value | Count | Frequency (%) |
| (unknown) | 382 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| C | 38 | 9.9% |
| 2 | 34 | 8.9% |
| B | 34 | 8.9% |
| 3 | 33 | 8.6% |
| 1 | 28 | 7.3% |
| 8 | 26 | 6.8% |
| 6 | 23 | 6.0% |
| 4 | 22 | 5.7% |
| E | 21 | 5.5% |
| 5 | 21 | 5.5% |
| Other values (9) | 104 | 27.1% |
| Value | Count | Frequency (%) |
| 2 | 44 | 11.5% |
| C | 37 | 9.7% |
| 6 | 33 | 8.6% |
| B | 32 | 8.4% |
| 3 | 29 | 7.6% |
| 1 | 26 | 6.8% |
| | 22 | 5.8% |
| E | 20 | 5.2% |
| 4 | 20 | 5.2% |
| 7 | 19 | 5.0% |
| Other values (9) | 100 | 26.2% |
Embarked
Categorical
| Dataset A | Dataset B | |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 1 | 1 |
| Missing (%) | 0.2% | 0.2% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| Dataset A | Dataset B | |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| Dataset A | Dataset B | |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
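The Distinct and Unique rows count different things, which is why Embarked can show 3 distinct values and 0 unique ones. A minimal sketch of that distinction, assuming Distinct is the number of different non-missing values and Unique is the number of values that occur exactly once; the helper name `distinct_and_unique` and the toy column are illustrative and not part of the report:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique) for a column, ignoring missing entries (None)."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)                                # different values seen
    unique = sum(1 for c in counts.values() if c == 1)    # values seen exactly once
    return distinct, unique

# Toy column: three ports, each appearing more than once, plus one missing entry.
embarked = ["S", "C", "S", "Q", "C", "Q", None, "S"]
print(distinct_and_unique(embarked))
```

A column such as PassengerId, where every value appears once, would have equal Distinct and Unique counts under the same definitions.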
Sample
| Dataset A | Dataset B | |
|---|---|---|
| 1st row | S | S |
| 2nd row | C | S |
| 3rd row | Q | S |
| 4th row | S | S |
| 5th row | C | S |
Common Values
| Value | Count | Frequency (%) |
| S | 317 | 71.1% |
| C | 86 | 19.3% |
| Q | 42 | 9.4% |
| (Missing) | 1 | 0.2% |
| Value | Count | Frequency (%) |
| S | 323 | 72.4% |
| C | 83 | 18.6% |
| Q | 39 | 8.7% |
| (Missing) | 1 | 0.2% |
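The frequencies in these tables are each count divided by the 446 observations, with the missing entry counted as its own row. A minimal sketch that rebuilds the Dataset A table from its counts; the helper name `common_values` and the use of `None` for a missing value are assumptions made for illustration:

```python
from collections import Counter

def common_values(values):
    """Return (value, count, percent) rows sorted by descending count.

    Missing entries (None) are reported under the label "(Missing)".
    """
    total = len(values)
    counts = Counter("(Missing)" if v is None else v for v in values)
    return [(value, n, round(100 * n / total, 1)) for value, n in counts.most_common()]

# Rebuild a column with the Dataset A counts: 317 S, 86 C, 42 Q, 1 missing.
column = ["S"] * 317 + ["C"] * 86 + ["Q"] * 42 + [None]
for value, n, pct in common_values(column):
    print(f"{value}: {n} ({pct}%)")
```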
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| s | 317 | 71.2% |
| c | 86 | 19.3% |
| q | 42 | 9.4% |
| Value | Count | Frequency (%) |
| s | 323 | 72.6% |
| c | 83 | 18.7% |
| q | 39 | 8.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 317 | 71.2% |
| C | 86 | 19.3% |
| Q | 42 | 9.4% |
| Value | Count | Frequency (%) |
| S | 323 | 72.6% |
| C | 83 | 18.7% |
| Q | 39 | 8.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| S | 317 | 71.2% |
| C | 86 | 19.3% |
| Q | 42 | 9.4% |
| Value | Count | Frequency (%) |
| S | 323 | 72.6% |
| C | 83 | 18.7% |
| Q | 39 | 8.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| S | 317 | 71.2% |
| C | 86 | 19.3% |
| Q | 42 | 9.4% |
| Value | Count | Frequency (%) |
| S | 323 | 72.6% |
| C | 83 | 18.7% |
| Q | 39 | 8.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| S | 317 | 71.2% |
| C | 86 | 19.3% |
| Q | 42 | 9.4% |
| Value | Count | Frequency (%) |
| S | 323 | 72.6% |
| C | 83 | 18.7% |
| Q | 39 | 8.8% |
Interactions
Interaction plots (25 panels), each shown for Dataset A and Dataset B.
Correlations
Dataset A
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.126 | 0.166 | -0.257 | 0.054 | 0.292 | 0.190 | -0.175 | 0.160 |
| Embarked | 0.126 | 1.000 | 0.261 | 0.024 | 0.129 | 0.309 | 0.137 | 0.089 | 0.212 |
| Fare | 0.166 | 0.261 | 1.000 | 0.401 | -0.017 | 0.565 | 0.196 | 0.442 | 0.293 |
| Parch | -0.257 | 0.024 | 0.401 | 1.000 | -0.043 | 0.000 | 0.269 | 0.404 | 0.170 |
| PassengerId | 0.054 | 0.129 | -0.017 | -0.043 | 1.000 | 0.041 | 0.148 | -0.093 | 0.089 |
| Pclass | 0.292 | 0.309 | 0.565 | 0.000 | 0.041 | 1.000 | 0.130 | 0.145 | 0.355 |
| Sex | 0.190 | 0.137 | 0.196 | 0.269 | 0.148 | 0.130 | 1.000 | 0.178 | 0.587 |
| SibSp | -0.175 | 0.089 | 0.442 | 0.404 | -0.093 | 0.145 | 0.178 | 1.000 | 0.163 |
| Survived | 0.160 | 0.212 | 0.293 | 0.170 | 0.089 | 0.355 | 0.587 | 0.163 | 1.000 |
Dataset B
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.058 | 0.132 | -0.228 | 0.024 | 0.285 | 0.055 | -0.239 | 0.186 |
| Embarked | 0.058 | 1.000 | 0.205 | 0.000 | 0.000 | 0.257 | 0.149 | 0.076 | 0.200 |
| Fare | 0.132 | 0.205 | 1.000 | 0.451 | -0.039 | 0.466 | 0.279 | 0.459 | 0.309 |
| Parch | -0.228 | 0.000 | 0.451 | 1.000 | 0.004 | 0.000 | 0.283 | 0.541 | 0.168 |
| PassengerId | 0.024 | 0.000 | -0.039 | 0.004 | 1.000 | 0.054 | 0.050 | -0.046 | 0.125 |
| Pclass | 0.285 | 0.257 | 0.466 | 0.000 | 0.054 | 1.000 | 0.141 | 0.143 | 0.325 |
| Sex | 0.055 | 0.149 | 0.279 | 0.283 | 0.050 | 0.141 | 1.000 | 0.227 | 0.542 |
| SibSp | -0.239 | 0.076 | 0.459 | 0.541 | -0.046 | 0.143 | 0.227 | 1.000 | 0.177 |
| Survived | 0.186 | 0.200 | 0.309 | 0.168 | 0.125 | 0.325 | 0.542 | 0.177 | 1.000 |
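The High correlation alerts in the Overview line up with the entries of these matrices whose absolute value is at least 0.5. A minimal sketch that scans a matrix for such pairs, assuming a 0.5 cutoff (inferred from the alerts, not stated in this section) and a hypothetical helper name `high_correlation_pairs`; the values are copied from the two tables above:

```python
def high_correlation_pairs(names, matrix, threshold=0.5):
    """Return (name_a, name_b, value) for each off-diagonal pair with |value| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only: the matrix is symmetric
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs

names = ["Age", "Embarked", "Fare", "Parch", "PassengerId",
         "Pclass", "Sex", "SibSp", "Survived"]

dataset_a = [
    [1.000, 0.126, 0.166, -0.257, 0.054, 0.292, 0.190, -0.175, 0.160],
    [0.126, 1.000, 0.261, 0.024, 0.129, 0.309, 0.137, 0.089, 0.212],
    [0.166, 0.261, 1.000, 0.401, -0.017, 0.565, 0.196, 0.442, 0.293],
    [-0.257, 0.024, 0.401, 1.000, -0.043, 0.000, 0.269, 0.404, 0.170],
    [0.054, 0.129, -0.017, -0.043, 1.000, 0.041, 0.148, -0.093, 0.089],
    [0.292, 0.309, 0.565, 0.000, 0.041, 1.000, 0.130, 0.145, 0.355],
    [0.190, 0.137, 0.196, 0.269, 0.148, 0.130, 1.000, 0.178, 0.587],
    [-0.175, 0.089, 0.442, 0.404, -0.093, 0.145, 0.178, 1.000, 0.163],
    [0.160, 0.212, 0.293, 0.170, 0.089, 0.355, 0.587, 0.163, 1.000],
]

dataset_b = [
    [1.000, 0.058, 0.132, -0.228, 0.024, 0.285, 0.055, -0.239, 0.186],
    [0.058, 1.000, 0.205, 0.000, 0.000, 0.257, 0.149, 0.076, 0.200],
    [0.132, 0.205, 1.000, 0.451, -0.039, 0.466, 0.279, 0.459, 0.309],
    [-0.228, 0.000, 0.451, 1.000, 0.004, 0.000, 0.283, 0.541, 0.168],
    [0.024, 0.000, -0.039, 0.004, 1.000, 0.054, 0.050, -0.046, 0.125],
    [0.285, 0.257, 0.466, 0.000, 0.054, 1.000, 0.141, 0.143, 0.325],
    [0.055, 0.149, 0.279, 0.283, 0.050, 0.141, 1.000, 0.227, 0.542],
    [-0.239, 0.076, 0.459, 0.541, -0.046, 0.143, 0.227, 1.000, 0.177],
    [0.186, 0.200, 0.309, 0.168, 0.125, 0.325, 0.542, 0.177, 1.000],
]

for label, matrix in (("Dataset A", dataset_a), ("Dataset B", dataset_b)):
    print(label, high_correlation_pairs(names, matrix))
```

With these inputs the scan reports Fare/Pclass and Sex/Survived for Dataset A, and Parch/SibSp and Sex/Survived for Dataset B, which are the same pairs the alerts name.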
Missing values
Dataset A, Dataset B
A simple visualization of nullity by column.
Dataset A, Dataset B
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
Dataset A, Dataset B
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
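One way to read nullity correlation is as the ordinary Pearson correlation between two columns' missing-value indicators. A minimal sketch under that assumption; the report does not say how its heatmap is computed, and the helper name `nullity_correlation`, the use of `None` for a missing cell, and the toy columns are all hypothetical:

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns.

    Returns None when either column is fully present or fully missing,
    because its indicator then has zero variance.
    """
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)

# Toy columns (hypothetical values, not taken from the datasets in this report).
age = [22.0, None, 30.0, None, 41.0, 19.0]
cabin = ["C85", None, None, None, "E46", "B20"]
deck = [None, "C", None, "C", None, None]  # missing exactly where age is present

print(nullity_correlation(age, cabin))  # positive: often missing together
print(nullity_correlation(age, deck))   # negative: one is missing when the other is present
```

A value near 1 means two columns tend to be missing in the same rows, a value near -1 means one tends to be present where the other is missing, and a value near 0 means the gaps are unrelated.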
Sample
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 149 | 150 | 0 | 2 | Byles, Rev. Thomas Roussel Davids | male | 42.0 | 0 | 0 | 244310 | 13.0000 | NaN | S |
| 306 | 307 | 1 | 1 | Fleming, Miss. Margaret | female | NaN | 0 | 0 | 17421 | 110.8833 | NaN | C |
| 502 | 503 | 0 | 3 | O'Sullivan, Miss. Bridget Mary | female | NaN | 0 | 0 | 330909 | 7.6292 | NaN | Q |
| 281 | 282 | 0 | 3 | Olsson, Mr. Nils Johan Goransson | male | 28.0 | 0 | 0 | 347464 | 7.8542 | NaN | S |
| 484 | 485 | 1 | 1 | Bishop, Mr. Dickinson H | male | 25.0 | 1 | 0 | 11967 | 91.0792 | B49 | C |
| 229 | 230 | 0 | 3 | Lefebre, Miss. Mathilde | female | NaN | 3 | 1 | 4133 | 25.4667 | NaN | S |
| 662 | 663 | 0 | 1 | Colley, Mr. Edward Pomeroy | male | 47.0 | 0 | 0 | 5727 | 25.5875 | E58 | S |
| 435 | 436 | 1 | 1 | Carter, Miss. Lucile Polk | female | 14.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S |
| 727 | 728 | 1 | 3 | Mannion, Miss. Margareth | female | NaN | 0 | 0 | 36866 | 7.7375 | NaN | Q |
| 768 | 769 | 0 | 3 | Moran, Mr. Daniel J | male | NaN | 1 | 0 | 371110 | 24.1500 | NaN | Q |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 263 | 264 | 0 | 1 | Harrison, Mr. William | male | 40.0 | 0 | 0 | 112059 | 0.0000 | B94 | S |
| 686 | 687 | 0 | 3 | Panula, Mr. Jaako Arnold | male | 14.0 | 4 | 1 | 3101295 | 39.6875 | NaN | S |
| 498 | 499 | 0 | 1 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 25.0 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S |
| 58 | 59 | 1 | 2 | West, Miss. Constance Mirium | female | 5.0 | 1 | 2 | C.A. 34651 | 27.7500 | NaN | S |
| 811 | 812 | 0 | 3 | Lester, Mr. James | male | 39.0 | 0 | 0 | A/4 48871 | 24.1500 | NaN | S |
| 745 | 746 | 0 | 1 | Crosby, Capt. Edward Gifford | male | 70.0 | 1 | 1 | WE/P 5735 | 71.0000 | B22 | S |
| 491 | 492 | 0 | 3 | Windelov, Mr. Einar | male | 21.0 | 0 | 0 | SOTON/OQ 3101317 | 7.2500 | NaN | S |
| 130 | 131 | 0 | 3 | Drazenoic, Mr. Jozef | male | 33.0 | 0 | 0 | 349241 | 7.8958 | NaN | C |
| 837 | 838 | 0 | 3 | Sirota, Mr. Maurice | male | NaN | 0 | 0 | 392092 | 8.0500 | NaN | S |
| 467 | 468 | 0 | 1 | Smart, Mr. John Montgomery | male | 56.0 | 0 | 0 | 113792 | 26.5500 | NaN | S |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 718 | 719 | 0 | 3 | McEvoy, Mr. Michael | male | NaN | 0 | 0 | 36568 | 15.5000 | NaN | Q |
| 94 | 95 | 0 | 3 | Coxon, Mr. Daniel | male | 59.0 | 0 | 0 | 364500 | 7.2500 | NaN | S |
| 385 | 386 | 0 | 2 | Davies, Mr. Charles Henry | male | 18.0 | 0 | 0 | S.O.C. 14879 | 73.5000 | NaN | S |
| 220 | 221 | 1 | 3 | Sunderland, Mr. Victor Francis | male | 16.0 | 0 | 0 | SOTON/OQ 392089 | 8.0500 | NaN | S |
| 293 | 294 | 0 | 3 | Haas, Miss. Aloisia | female | 24.0 | 0 | 0 | 349236 | 8.8500 | NaN | S |
| 461 | 462 | 0 | 3 | Morley, Mr. William | male | 34.0 | 0 | 0 | 364506 | 8.0500 | NaN | S |
| 760 | 761 | 0 | 3 | Garfirth, Mr. John | male | NaN | 0 | 0 | 358585 | 14.5000 | NaN | S |
| 452 | 453 | 0 | 1 | Foreman, Mr. Benjamin Laventall | male | 30.0 | 0 | 0 | 113051 | 27.7500 | C111 | C |
| 233 | 234 | 1 | 3 | Asplund, Miss. Lillian Gertrud | female | 5.0 | 4 | 2 | 347077 | 31.3875 | NaN | S |
| 788 | 789 | 1 | 3 | Dean, Master. Bertram Vere | male | 1.0 | 1 | 2 | C.A. 2315 | 20.5750 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 450 | 451 | 0 | 2 | West, Mr. Edwy Arthur | male | 36.0 | 1 | 2 | C.A. 34651 | 27.7500 | NaN | S |
| 200 | 201 | 0 | 3 | Vande Walle, Mr. Nestor Cyriel | male | 28.0 | 0 | 0 | 345770 | 9.5000 | NaN | S |
| 309 | 310 | 1 | 1 | Francatelli, Miss. Laura Mabel | female | 30.0 | 0 | 0 | PC 17485 | 56.9292 | E36 | C |
| 372 | 373 | 0 | 3 | Beavan, Mr. William Thomas | male | 19.0 | 0 | 0 | 323951 | 8.0500 | NaN | S |
| 322 | 323 | 1 | 2 | Slayter, Miss. Hilda Mary | female | 30.0 | 0 | 0 | 234818 | 12.3500 | NaN | Q |
| 502 | 503 | 0 | 3 | O'Sullivan, Miss. Bridget Mary | female | NaN | 0 | 0 | 330909 | 7.6292 | NaN | Q |
| 443 | 444 | 1 | 2 | Reynaldo, Ms. Encarnacion | female | 28.0 | 0 | 0 | 230434 | 13.0000 | NaN | S |
| 444 | 445 | 1 | 3 | Johannesen-Bratthammer, Mr. Bernt | male | NaN | 0 | 0 | 65306 | 8.1125 | NaN | S |
| 838 | 839 | 1 | 3 | Chip, Mr. Chang | male | 32.0 | 0 | 0 | 1601 | 56.4958 | NaN | S |
| 813 | 814 | 0 | 3 | Andersson, Miss. Ebba Iris Alfrida | female | 6.0 | 4 | 2 | 347082 | 31.2750 | NaN | S |
Duplicate rows
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.
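A duplicate-row check of this kind can be sketched by counting identical rows and keeping those seen more than once. The helper name `duplicate_rows` and the toy rows below are hypothetical; this shows the idea and is not the report's own implementation:

```python
from collections import Counter

def duplicate_rows(rows):
    """Return {row: count} for every row that appears more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

# Toy rows of (Pclass, Sex, Embarked); real rows would carry all 12 columns.
rows = [
    (3, "male", "S"),
    (1, "female", "C"),
    (3, "male", "S"),
    (2, "female", "Q"),
]
print(duplicate_rows(rows))
```

An empty result corresponds to the "Dataset does not contain duplicate rows" message; with a unique PassengerId in every row, full-row duplicates cannot occur.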